---
title: "BC Analysis"
output:
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      bootswatch: default
      navbar-bg: "lightpink"
    orientation: columns
    vertical_layout: fill
    source_code: embed
---
```{r setup, include=FALSE}
library(flexdashboard)
library(DT)
library(tidyverse)
library(plotly)
library(dplyr)
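data <- read.csv("Breast_Cancer.csv")
# Map the raw grade codes to readable labels; the Grade IV key keeps the leading space of the raw value
data$Grade <- recode(data$Grade,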
'1'="Grade I",
'2'="Grade II",
'3'="Grade III",
' anaplastic; Grade IV'="Grade IV")
```
Background
===
Column {.tabset data-width=500}
-----------------------------------------------------------------------
### Title
<font size=8><b><span Style="color:#e9578a">Insights into the Pathology of Breast Cancer by Analysis of Clinicopathological Features</span></b></font>
<font size=6><b><span Style="color:#FA8072">Ashley Williams</span></b></font>
### <span Style="color:#b4005d">Breast Cancer</span>
<span Style="color:#FA8072">Breast cancer is a disease in which abnormal breast cells grow out of control and form tumors. If left unchecked, the tumors can spread throughout the body (metastasize) and become fatal.</span>
- <span Style="color:#9d0f0f">Breast cancer is a multifactorial disease, meaning it is caused by a combination of genetic and environmental factors. Breast cancer can have a strong hereditary component, most notably from mutations in the genes <i>BRCA1</i> and <i>BRCA2</i>. The interaction between genetic predispositions and environmental factors is crucial in understanding the individual risk of developing breast cancer.</span>
- <span Style="color:#FA8072">Breast cancer cells begin inside the milk ducts and/or the milk-producing lobules of the breast. The earliest form (in situ) is not life-threatening and can be detected in early stages. Cancer cells can spread into nearby breast tissue (invasion). This creates tumors that cause lumps or thickening.</span>
- <span Style="color:#9d0f0f">Invasive cancers can spread to nearby lymph nodes or other organs (metastasize). Metastasis can be life-threatening and fatal.</span>
- <span Style="color:#FA8072">Treatment is based on the person, the type of cancer and its spread. Treatment combines surgery, radiation therapy and medications.</span>
- <span Style="color:#9d0f0f">In 2022, there were 2.3 million women diagnosed with breast cancer and 670,000 deaths globally. Breast cancer occurs in every country of the world in women at any age after puberty but with increasing rates in later life.</span>
<span Style="color:#FA8072">Source: [World Health Organization](https://www.who.int/news-room/fact-sheets/detail/breast-cancer)</span>
### <span Style="color:#b4005d">Research Questions</span>
The research questions I’d like to address throughout this project using this data set include the following:
- <span Style="color:#FA8072">How do individual factors such as age, estrogen status, and progesterone status relate to breast cancer, through the patient’s status and their survival after diagnosis?</span>
- <span Style="color:#CD5C5C">How do these factors affect the individual’s cancer profile, tumor nodule size and tumor grade?</span>
### <span Style="color:#b4005d">Variables of Interest</span>
- The data contains the following variables that are of interest for this project:
    - <span Style = "color:#ff467e">Age</span> is the age of the individual at diagnosis.
    - <span Style = "color:#b10c3e">Grade</span> is the grade of the tumor, which describes how abnormal the cancer cells look under a microscope; higher grades generally indicate faster-growing, more aggressive cancer.
    - <span Style = "color:#ff467e">Tumor Size</span> is the size of the tumor in millimeters.
    - <span Style = "color:#b10c3e">Estrogen Status</span> is whether the tumor cells have estrogen receptors (ER+ or ER-).
    - <span Style = "color:#ff467e">Progesterone Status</span> is whether the tumor cells have progesterone receptors (PR+ or PR-).
    - <span Style = "color:#b10c3e">Survival Months</span> is the number of months from diagnosis until death or the end of follow-up.
    - <span Style = "color:#ff467e">Status</span> is whether the patient was alive or dead at the end of this study.
### <span Style="color:#b4005d">Motivations</span>
I am motivated to investigate these variables due to my interest in <span Style="color:green">biology</span>.
- <span Style="color:#cd516f">Throughout my academic career, I have developed an interest in evolutionary developmental biology and the genetic and cellular mechanisms that govern it.</span>
- <span Style="color:#8c2240">Thus, the hormone receptor variables were of particular interest to me personally, and I would find it very interesting to investigate the molecular signalling cascades that hormone receptors take part in and to flesh out their associations with aspects of breast cancer.</span>
- <span Style="color:#DE3163">I also think it would be a good direction for my project to relate these more molecular features of cancer to a patient-level feature like age.</span>
- <span Style="color:#8c2240">And finally, I am currently taking the course Genetics of Human Disease, and I’ve learned a lot about breast cancer. I think it would be interesting to dig further into it through actual patient data and analyze trends in the clinical manifestations of the disease.</span>
### <span Style="color:#b4005d">Data Source & Cleaning</span>
<span Style="color:#CD5C5C">I obtained my data for this project from [Kaggle](https://www.kaggle.com/datasets/reihanenamdari/breast-cancer).</span>
- <span Style="color:#CD5C5C">About the dataset</span>
    - <span Style="color:#9d0f0f">This dataset of breast cancer patients was obtained from the November 2017 update of the SEER (Surveillance, Epidemiology, and End Results) Program of the NCI, which provides information on population-based cancer statistics.</span>
    - <span Style="color:#9d0f0f">The dataset includes female patients with infiltrating duct and lobular carcinoma breast cancer diagnosed in 2006-2010.</span>
- <span Style="color:#CD5C5C">Data Cleaning</span>
    - <span Style="color:#9d0f0f">Patients with unknown tumor size and patients whose survival months were less than 1 month were excluded; thus, 4024 patients were ultimately included.</span>
    - <span Style="color:#9d0f0f">I additionally used the statistical software JMP to further clean the data, even though it was pre-cleaned.</span>
    - <span Style="color:#9d0f0f">I first checked for missing values and found that there were none.</span>
    - <span Style="color:#9d0f0f">I then checked the coding of the data and variables, and changed the following:</span>
        - <span Style="color:#9d0f0f">Reginol Node Positive -> Regional Node Positive (spelling error); however, I do not use this variable, so it does not affect this analysis.</span>
        - <span Style="color:#c73a3a">I recoded the “Grade” data as follows:</span>
            - <span Style="color:#c73a3a">“1” -> Grade I</span>
            - <span Style="color:#c73a3a">“2” -> Grade II</span>
            - <span Style="color:#c73a3a">“3” -> Grade III</span>
            - <span Style="color:#c73a3a">“Anaplastic; Grade IV” -> Grade IV</span>
    - <span Style="color:#9d0f0f">I also checked the distributions of all the variables for outliers; while there were some, they were plausible values and were therefore kept in the data.</span>
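
The JMP checks described above (missing values, recoding, outliers) can also be sketched in base R. This is a minimal illustration under stated assumptions, not the procedure actually used here: the small `toy` data frame is hypothetical and only mimics the column names that `read.csv()` produces for `Breast_Cancer.csv`, and the boxplot rule (points beyond 1.5 times the hinge spread) is an assumed stand-in for inspecting the JMP distributions.

```r
# Hypothetical toy data mimicking the column names read.csv() gives Breast_Cancer.csv
toy <- data.frame(
  Age = c(43, 54, 61, 38, 70),
  Tumor.Size = c(12, 25, 30, 140, 22),
  Grade = c("1", "2", "3", " anaplastic; Grade IV", "2"),
  stringsAsFactors = FALSE
)

# 1. Missing values: number of NAs in each column (all zeros = none missing)
na_counts <- colSums(is.na(toy))

# 2. Recode Grade with a named lookup vector (a base-R counterpart to the
#    dplyr::recode() call in the setup chunk)
grade_map <- c("1" = "Grade I", "2" = "Grade II", "3" = "Grade III",
               " anaplastic; Grade IV" = "Grade IV")
toy$Grade <- unname(grade_map[toy$Grade])

# 3. Outliers: points beyond 1.5 times the hinge spread, the rule that
#    base R's boxplot() uses to draw individual points
size_outliers <- boxplot.stats(toy$Tumor.Size)$out

print(na_counts)
print(toy$Grade)
print(size_outliers)
```

With the real data, replace `toy` with `data`; as noted above, a flagged value is only a candidate outlier and may still be plausible.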
Column {data-width=500}
---
### <span Style="color:#b4005d">Data Table</span>
```{r show_table}
datatable(data[1:50,], rownames=FALSE)
```
Summary Statistics
===
Column {.tabset data-width=500}
---
### Age
```{r age_hist}
ggplot(data, aes(x=Age))+geom_histogram(fill="pink", color="black")+labs(title="Distribution of Age", x="Patient Age", y="Count")+
  geom_vline(xintercept=median(data$Age))+annotate("text", x=median(data$Age), y=300, label=paste("Median Age,", median(data$Age))) # annotate() draws the label once
```
### Grade
```{r}
# stat="count" labels each bar with its computed count instead of hard-coded positions and numbers
ggplot(data, aes(x=Grade))+geom_bar(fill="#ff5285", color="black")+geom_text(stat="count", aes(label=after_stat(count)), vjust=-0.5)+labs(title="Distribution of Tumor Grade", x="Tumor Grade", y="Count")
```
### Tumor Size
```{r size_hist}
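ggplot(data, aes(x=Tumor.Size))+geom_histogram(fill="pink", color="black")+labs(title="Distribution of Tumor Size", x="Tumor Size (mm)", y="Count")+
  geom_vline(xintercept=median(data$Tumor.Size))+annotate("text", x=median(data$Tumor.Size), y=500, hjust=0, label=paste("Median Tumor Size,", median(data$Tumor.Size))) # label computed and drawn once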
```
### Estrogen Status
```{r}
ggplot(data, aes(x=Estrogen.Status))+geom_bar(fill="#ff5285", color="black")+geom_text(stat="count", aes(label=after_stat(count)), vjust=-0.5)+labs(title="Distribution of Estrogen Status", x="Estrogen Status", y="Count")
```
### Progesterone Status
```{r}
ggplot(data, aes(x=Progesterone.Status))+geom_bar(fill="pink", color="black")+geom_text(stat="count", aes(label=after_stat(count)), vjust=-0.5)+labs(title="Distribution of Progesterone Status", x="Progesterone Status", y="Count")
```
### Survival Months
```{r}
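ggplot(data, aes(x=Survival.Months))+geom_histogram(fill="#ff5285", color="black")+labs(title="Distribution of Survival Months", x="Survival Months", y="Count")+
  geom_vline(xintercept=median(data$Survival.Months))+annotate("text", x=median(data$Survival.Months), y=265, label=paste("Median Survival Months,", median(data$Survival.Months))) # annotate() draws the label once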
```
### Status
```{r}
ggplot(data, aes(x=Status))+geom_bar(fill="pink", color="black")+geom_text(stat="count", aes(label=after_stat(count)), vjust=-0.5)+labs(title="Distribution of Status", x="Status", y="Count")
```
Column {data-width=500}
---
<b>Age Analysis</b>
- <span Style="color:red">The distribution of the age variable looks slightly skewed left.</span>
- <span Style="color:red">The data have a large spread.</span>
- <span Style="color:red">The median age at diagnosis is 54, and the mean is 53.9.</span>
<b>Grade Analysis</b>
- <span Style="color:red"> Breast cancer tumors are most frequently diagnosed at Grade II.</span>
- <span Style="color:red"> Breast cancer tumors are least frequently diagnosed at Grade IV (19 patients), which may be because symptoms would likely lead to these cancers being diagnosed sooner.</span>
<b>Tumor Size Analysis</b>
- <span Style="color:red">The distribution of breast cancer tumor size is skewed right.</span>
- <span Style="color:red">The median breast cancer tumor size is 25 mm, and the mean is 30.47 mm.</span>
- <span Style="color:red">There is a wide spread driven by a long right tail, even though most of the data is concentrated around the median.</span>
<b>Estrogen Status Analysis</b>
- <span Style="color:red"> Most breast cancer tumors have estrogen receptors (3,755 of 4,024 patients), with comparatively few (269) negative for them.</span>
<b>Progesterone Status Analysis</b>
- <span Style="color:red"> Similarly, most breast cancer tumors have progesterone receptors (3,326 patients), with 698 negative for them.</span>
<b>Survival Months Analysis</b>
- <span Style="color:red">The distribution of survival months is slightly skewed left.</span>
- <span Style="color:red">The median number of survival months is 73, and the mean is 71.3.</span>
- <span Style="color:red">There is a wide spread of data.</span>
<b>Status Analysis</b>
- <span Style="color:red"> Most patients (3,408 of 4,024) were alive at the end of the study; 616 died.</span>
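
The medians, means and counts quoted above can be computed directly rather than read off the plots or hard-coded into the labels. A minimal base-R sketch on a hypothetical `toy` data frame (substitute `data` to reproduce the real figures); comparing the mean with the median is only a rough heuristic for skew, not a formal test.

```r
# Hypothetical toy values, not the study data
toy <- data.frame(
  Age = c(30, 50, 54, 58, 60),
  Grade = c("Grade II", "Grade II", "Grade I", "Grade III", "Grade II"),
  Status = c("Alive", "Alive", "Dead", "Alive", "Alive"),
  stringsAsFactors = FALSE
)

# Center of a numeric variable; a mean below the median hints at a left skew
age_median <- median(toy$Age)
age_mean <- mean(toy$Age)

# Category counts (the numbers printed on the bar charts) and shares
grade_counts <- table(toy$Grade)
status_share <- prop.table(table(toy$Status))

print(c(median = age_median, mean = age_mean))
print(grade_counts)
print(status_share)
```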
Patient Status
===
Column {.tabset data-width=500}
---
### Age & SM
```{r}
ggplot(data, aes(x=Age, y=Survival.Months, color=Status))+geom_point()+geom_smooth(se=FALSE)+labs(title="Distribution of Survival Months by Age", x="Age", y="Survival Months")
```
### Status & Age
```{r}
ggplot(data, aes(x=Status, y=Age))+geom_boxplot(fill="pink")+labs(title="Distribution of Age by Status", x="Status", y="Age")
```
### ER & Status
```{r}
ggplot(data, aes(x=Estrogen.Status, fill=Status))+geom_bar(position="fill")+scale_y_continuous(labels=scales::percent)+labs(title="Patient Status by Estrogen Receptor Presence", x="Estrogen Receptor Presence", y="Percent of Patients")
```
### PR & Status
```{r}
ggplot(data, aes(x=Progesterone.Status, fill=Status))+geom_bar(position="fill")+scale_y_continuous(labels=scales::percent)+labs(title="Patient Status by Progesterone Receptor Presence", x="Progesterone Receptor Presence", y="Percent of Patients")
```
### ER Insights
```{r}
combined1 <- interaction(data$Status, data$Estrogen.Status)
ggplot(data, aes(x=combined1, y=Survival.Months))+geom_boxplot(fill="#ff5285")+labs(title="Distribution of Survival Months by ER Presence and Patient Status", x="Estrogen Receptor Presence and Patient Status", y="Survival Months")
```
### PR Insights
```{r}
combined2 <- interaction(data$Status, data$Progesterone.Status)
ggplot(data, aes(x=combined2, y=Survival.Months))+geom_boxplot(fill="pink")+labs(title="Distribution of Survival Months by PR Presence and Patient Status", x="Progesterone Receptor Presence and Patient Status", y="Survival Months")
```
Column {data-width=500}
---
- There does not seem to be a relationship between age and survival months. Patients who lived through this study have higher average survival months than those who died, but this is not surprising: a patient's survival months stop accruing once they die.
- While the mean and median ages of patients who died are slightly higher than those of patients who lived, the difference is small.
- A higher percentage of patients whose cancer is negative for estrogen receptors (ER-) died during the study than patients whose cancer is positive for them (ER+).
- We see the same pattern among patients negative for progesterone receptors (PR-).
- ER+ patients had considerably higher mean and median survival months than ER- patients. We see this same pattern for progesterone receptors.
- The most interesting thing to note here is that among the patients who died, ER+ patients had a higher median number of survival months than ER- patients. Patients who lived have comparable medians regardless of ER status.
- Progesterone receptors show the same pattern as estrogen receptors.
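
The two comparisons above (the share of deaths by ER status, and median survival months among patients who died) reduce to a row-proportion table and a grouped median. A minimal base-R sketch on hypothetical toy rows; the same code applies to `Progesterone.Status` or to the real `data`.

```r
# Hypothetical toy patients, not the study data
toy <- data.frame(
  Estrogen.Status = c("Positive", "Positive", "Positive", "Positive",
                      "Negative", "Negative", "Negative", "Negative"),
  Status = c("Alive", "Alive", "Alive", "Dead",
             "Alive", "Dead", "Dead", "Alive"),
  Survival.Months = c(80, 95, 70, 60, 85, 20, 30, 90),
  stringsAsFactors = FALSE
)

# Row proportions: within each ER group, the share Alive vs Dead
death_by_er <- prop.table(table(toy$Estrogen.Status, toy$Status), margin = 1)

# Median survival months among patients who died, split by ER status
died <- subset(toy, Status == "Dead")
median_sm_died <- tapply(died$Survival.Months, died$Estrogen.Status, median)

print(round(death_by_er, 2))
print(median_sm_died)
```

Restricting to patients who died before taking the median matters here, because living patients' survival months only reflect how long they were followed.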
Cancer Morphology
===
Column {.tabset data-width=650}
---
### Grade & TS
```{r}
ggplot(data, aes(x=Grade, y=Tumor.Size))+geom_boxplot(fill="pink")+labs(title="Distribution of Tumor Size by Tumor Grade", x="Tumor Grade", y="Tumor Size (mm)")
```
### ER Insights
```{r}
ggplot(data, aes(x=Estrogen.Status, y=Tumor.Size))+geom_boxplot(fill="#ff5285")+labs(title="Distribution of Tumor Size by ER Presence", x="ER Presence", y="Tumor Size (mm)")
```
### PR Insights
```{r}
ggplot(data, aes(x=Progesterone.Status, y=Tumor.Size))+geom_boxplot(fill="pink")+labs(title="Distribution of Tumor Size by PR Presence", x="PR Presence", y="Tumor Size (mm)")
```
### Age & Cancer Morphology
```{r}
ggplot(data, aes(x=Age, y=Tumor.Size, color=Grade))+geom_point()+geom_smooth(se=FALSE)+labs(title="Distribution of Tumor Size by Age and Grade",x="Age", y="Tumor Size (mm)")
```
### ER & Grade
```{r}
ggplot(data, aes(x=Estrogen.Status, fill=Grade))+geom_bar(position="fill")+scale_y_continuous(labels=scales::percent)+labs(title="Distribution of Tumor Grade by ER Presence", x="ER Presence", y="Percent of Patients")
```
### PR & Grade
```{r}
ggplot(data, aes(x=Progesterone.Status, fill=Grade))+geom_bar(position="fill")+scale_y_continuous(labels=scales::percent)+labs(title="Distribution of Tumor Grade by PR Presence", x="PR Presence", y="Percent of Patients")
```
Column {data-width=350}
---
### Analysis
- Median tumor size increases with cancer grade.
- ER- or PR- tumors have larger mean and median tumor sizes than ER+ or PR+ tumors.
- There appears to be no relationship between tumor size and age.
- ER- tumors are more frequently higher grade tumors than ER+ tumors. The same pattern is seen for progesterone receptors.
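
Two of the observations above can be summarized with single numbers: the median tumor size within each grade, and a rank correlation for the age and tumor size relationship. Spearman's correlation is an added suggestion, not something computed in this dashboard, and the toy rows below are hypothetical.

```r
# Hypothetical toy patients, not the study data
toy <- data.frame(
  Age = c(35, 48, 52, 60, 67, 71),
  Grade = c("Grade I", "Grade III", "Grade II", "Grade III", "Grade II", "Grade I"),
  Tumor.Size = c(15, 40, 22, 35, 28, 18),
  stringsAsFactors = FALSE
)

# Median tumor size (mm) within each grade
median_size_by_grade <- tapply(toy$Tumor.Size, toy$Grade, median)

# Spearman's rank correlation between age and tumor size:
# values near 0 indicate no monotone trend
rho <- cor(toy$Age, toy$Tumor.Size, method = "spearman")

print(median_size_by_grade)
print(round(rho, 3))
```

A rank correlation is used rather than Pearson's because tumor size is right-skewed, so a few very large tumors would otherwise dominate the coefficient.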
Patients & Cancer
===
Column {.tabset data-width=650}
---
### Patients -ER
```{r}
ggplot(data, aes(x=Status, y=Tumor.Size))+geom_boxplot(fill="#ff5285")+facet_grid(~Estrogen.Status)+labs(title="Distribution of Tumor Size by ER Presence and Status", x="Status", y="Tumor Size (mm)")
```
### Patients -PR
```{r}
ggplot(data, aes(x=Status, y=Tumor.Size))+geom_boxplot(fill="pink")+facet_grid(~Progesterone.Status)+labs(title="Distribution of Tumor Size by PR Presence and Status", x="Status", y="Tumor Size (mm)")
```
### Double - & Status
```{r}
combined3 <- interaction(data$Progesterone.Status, data$Estrogen.Status)
ggplot(data, aes(x=combined3, fill=Status))+geom_bar(position="fill")+scale_y_continuous(labels=scales::percent)+labs(title="Distribution of Patient Status by HR Status", x="HR Presence (P/E)", y="Percent of Patients")
```
### Double - & Grade
```{r}
ggplot(data, aes(x=combined3, fill=Grade))+geom_bar(position="fill")+scale_y_continuous(labels=scales::percent)+labs(x="HR Presence (P/E)", title="Distribution of Cancer Grade by HR Status", y="Percent of Patients")
```
### Tumor Grade and Patient Status
```{r}
ggplot(data, aes(x=Grade, fill=Status))+geom_bar(position="fill")+scale_y_continuous(labels=scales::percent)+labs(title="Distribution of Patient Status by Tumor Grade", x="Tumor Grade", y="Percent of Patients")
```
Column {data-width=500}
---
### Analysis
<b>This section aims to combine the insights gained from each research question for a more complete picture of breast cancer in these patients.</b>
- Tumor size differs much more by patient status among ER- patients than among ER+ patients. Patients who died have a higher median tumor size, and ER- patients who died had the highest median.
- We see this same pattern for PR- patients.
- When comparing both ER and PR status against patient status, we see that "double negative" ER-/PR- patients have the highest frequency of death, patients negative for only one hormone receptor (HR) die at roughly equal rates that are lower than those of double negative patients, and double positive ER+/PR+ patients have the lowest frequency of death.
- When comparing cancer grade across HR statuses, we see that double negative patients have the highest proportion of higher grade cancers.
- Interestingly, ER-/PR+ patients have the highest proportion of Grade IV cancer.
- I think this is likely due to random chance, since there are very few Grade IV cases in the sample (19 in total; often, cancer is diagnosed before it reaches Grade IV).
- When we compare cancer grade to status, we see the proportion of patients who died increase with each cancer grade.
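
Grouping patients by their receptor combination, as in the HR status plots, can also be done with readable labels such as "ER-/PR-" instead of the "Negative.Negative" style labels that `interaction()` produces. The sketch below is an assumed alternative, not the code behind the plots; `sign_of()` is a hypothetical helper and the rows are toy data.

```r
# Hypothetical toy patients, not the study data
toy <- data.frame(
  Estrogen.Status = c("Negative", "Negative", "Negative", "Positive",
                      "Positive", "Positive", "Positive", "Negative"),
  Progesterone.Status = c("Negative", "Negative", "Positive", "Negative",
                          "Positive", "Positive", "Positive", "Negative"),
  Status = c("Dead", "Dead", "Alive", "Alive",
             "Alive", "Dead", "Alive", "Alive"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: "Positive" -> "+", anything else -> "-"
sign_of <- function(x) ifelse(x == "Positive", "+", "-")

# Label each patient by receptor combination, e.g. "ER-/PR-" (double negative)
toy$HR.Group <- paste0("ER", sign_of(toy$Estrogen.Status),
                       "/PR", sign_of(toy$Progesterone.Status))

# Share of each group that died: the mean of a logical vector is a proportion
death_rate <- tapply(toy$Status == "Dead", toy$HR.Group, mean)
print(round(death_rate, 2))
```

Passing `toy$Grade` (or `data$Grade`) in place of `toy$HR.Group` gives the proportion of deaths per tumor grade with the same one-liner.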
Conclusions
===
Column {.tabset data-width=1000}
---
### Conclusions
To address the first research question, <b>How do individual factors such as age, estrogen status, and progesterone status relate to breast cancer, through the patient’s status and their survival after diagnosis?</b>
- We saw that age was not related to patient status or survival after diagnosis, so age alone is not a sufficient predictor of either.
- We saw that ER- and PR- patients showed similar outcomes: hormone receptor negative (HR-) patients had a higher frequency of death than HR+ patients. Additionally, among patients who died during the study, HR- patients had a lower median number of survival months than their HR+ counterparts.
To address the second research question, <b>How do these factors affect the individual’s cancer profile, tumor nodule size and tumor grade?</b>
- We saw no relationship between age and either tumor size or grade, so age is not a sufficient predictor of cancer morphology.
- We saw that HR- patients have larger median tumor sizes compared to their HR+ counterparts, and HR- patients were more frequently diagnosed with higher grade cancers. This connection makes sense, as higher cancer grade is associated with larger tumor sizes.
### Implications
When connecting these ideas, we saw that double negative patients were almost 4x as likely to have died during the study compared to the double positive patients.
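
The "almost 4x" comparison is a relative risk (RR): the proportion of double negative patients who died divided by the proportion of double positive patients who died. Written out, with $d$ the number of deaths and $n$ the number of patients in each group (DN = ER-/PR-, DP = ER+/PR+):

$$
\mathrm{RR} = \frac{d_{\mathrm{DN}} / n_{\mathrm{DN}}}{d_{\mathrm{DP}} / n_{\mathrm{DP}}}
$$

For example, an RR of 2 would mean double negative patients died at twice the rate of double positive patients; "almost 4x as likely" corresponds to an RR just under 4.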
<b>In conclusion, age is not a sufficient predictor of cancer morphology or patient longevity, but estrogen and progesterone receptor status are both associated with cancer morphology and patient longevity.</b>
- The results supporting these conclusions make sense, as it is widely accepted in oncology and cancer research that tumors negative for hormone receptors, like estrogen and progesterone receptors, are more dangerous and aggressive cancers. They tend to grow and spread more rapidly, and they do not respond to hormone therapies that are effective for HR+ cancers.
- The higher fatality we saw among double negative patients is also consistent with breast cancer research, as triple negative breast cancer (ER-, PR-, HER2-) is known to be one of the most aggressive types (HER2 is a protein that regulates cell growth and division).
### Limitations and Future Directions
- This study did not address the presence of HER2, which would be a good additional direction to gain insights into how many of these patients were triple negative rather than double negative, and to see how that difference would impact their prognosis as well as cancer morphology.
- This study did not address treatments the patients may have been receiving, which could've made a difference in their survival months or final status. A good future direction for this study would be to analyze these variables we've found in conjunction with their treatments to establish relationships there.
- Next, this dataset included only female patients, so it could not address the role of patient sex in the manifestation of the disease. While breast cancer overwhelmingly affects female patients, male patients can also develop breast cancer. Although male cases would likely form a small sample, it would be interesting to analyze them in conjunction with the variables analyzed in this study.
- Finally, I would be interested in seeing a genetic analysis dataset behind breast cancer patients to understand what mutations are causing different kinds of breast cancer.
- For instance, is <i>BRCA1</i> more closely related to triple negative breast cancer than <i>BRCA2</i>?
- Which regions of these genes (exons, introns, regulatory sequences, etc.) are most commonly mutated, and how do different mutations change the implications for breast cancer?
- Or further, how many breast cancer cases are associated with the patient having Li-Fraumeni syndrome?
About the Author
===
Column {data-width=500}
---
### Background
My name is Ashley Williams and I am an undergraduate student attending the University of Dayton. I am majoring in Biology and I am minoring in Chemistry, Data Analytics, Neuroscience, and Research in the Biological Sciences. My anticipated graduation is in May of 2027.
I am an undergraduate researcher and a co-author of a [peer-reviewed scientific paper](https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1010653). I conduct my research in the [Williams Lab](https://thetomwilliamslab.com/), where I specifically study the regulation of the <i>Drosophila melanogaster pale</i> gene and its origin during the evolution of a dimorphic pigmentation trait. I have been heavily involved in scientific research since 2021, and I have presented my research on numerous occasions, including twice at the University of Dayton's <span Style="color:#cf311e">Stander Symposium</span> and at <span Style="color:#267c28">the Society for Developmental Biology's 83rd Annual Meeting</span>, and this summer I will be at <span Style="color:#0066b6">the American Society for Biochemistry and Molecular Biology's conference, "Evolution and core processes in gene regulation"</span>.
I am interested in pursuing a Ph.D. in the field of genetics after my graduation, and continuing my career in academia and biological research.
I additionally have a strong interest in marine biology, and while I do not intend to pursue career options in this field, I do enjoy environmental activism and spending my free time (and money) scuba diving and freediving.
Column {data-width=500}
---
### Presenting
```{r,fig.width=6, echo=FALSE, fig.align='right'}
knitr::include_graphics("IMG_6751.jpeg")
```
### Diving
```{r, fig.width=6, echo=FALSE,fig.align='left'}
knitr::include_graphics("IMG_6752.jpeg")
```